Spatial Data Mining Implementation: Alternatives and Performances
Abstract
Spatial data mining requires analyzing interactions in space. These interactions can be materialized as distance tables, which reduces spatial data mining to multi-table analysis. However, conventional data mining algorithms take a single input table in which each row is one observation to analyze. Simple relational joins between these tables do not solve the problem and distort the results, because observations are counted multiple times. We propose three alternatives for multi-table data mining in the context of spatial data mining. The first substantially modifies the conventional algorithm so that it takes these tables into account. The second optimizes the first approach: it pre-computes all join operations and adapts the conventional algorithm accordingly. The third reorganizes the data into a single table by completing (not joining) the target table with the existing data from the other tables, then applies any standard data mining algorithm without modification. This article presents these three alternatives, describes their implementation for classification algorithms, and compares their performance.
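The multiple-counting problem and the "completing" idea can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the target table is completed, so the per-kind count and minimum distance used here are assumptions, as are all table names, column names, values, and the `radius` parameter.

```python
# Hypothetical toy data; nothing here comes from the paper.
targets = [
    {"id": 1, "label": "high"},
    {"id": 2, "label": "low"},
]
neighbors = {
    "a": {"kind": "school"},
    "b": {"kind": "school"},
    "c": {"kind": "factory"},
}
# Distance table: (target_id, neighbor_id, distance)
distances = [
    (1, "a", 120.0),
    (1, "b", 300.0),
    (1, "c", 900.0),
    (2, "c", 50.0),
]


def naive_join(targets, distances):
    """Relational join: one row per (target, neighbor) pair, so targets repeat."""
    by_id = {t["id"]: t for t in targets}
    return [
        {"id": tid, "label": by_id[tid]["label"], "neighbor": nid, "distance": d}
        for tid, nid, d in distances
    ]


def complete_targets(targets, neighbors, distances, radius):
    """Assumed completion step: one row per target, neighbor data aggregated into columns."""
    kinds = sorted({n["kind"] for n in neighbors.values()})
    rows = []
    for t in targets:
        row = dict(t)
        near = [(nid, d) for tid, nid, d in distances if tid == t["id"]]
        for kind in kinds:
            ds = [d for nid, d in near if neighbors[nid]["kind"] == kind]
            row[f"n_{kind}_within"] = sum(1 for d in ds if d <= radius)
            row[f"min_dist_{kind}"] = min(ds) if ds else None
        rows.append(row)
    return rows


joined = naive_join(targets, distances)
completed = complete_targets(targets, neighbors, distances, radius=500.0)
```

In `joined`, target 1 appears once per neighbor, so a learner fed this table would weight it more heavily than target 2. In `completed`, each target remains a single observation, and any single-table classifier can consume it unchanged.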
Similar resources
Mise en oeuvre des méthodes de fouille de données spatiales - Alternatives et performances
Spatial data mining requires the analysis of interactions in space. These interactions can be materialized using distance tables, reducing spatial data mining to multi-table analysis. However, conventional data mining algorithms consider only one input table with one observation per row. Normal joins between these tables do not resolve the problem and mislead the results because of th...
Detecting Diseases in Medical Prescriptions Using Data Mining Tools and Combining Techniques
Data about the prevalence of communicable and non-communicable diseases, as one of the most important categories of epidemiological data, is used for interpreting health status of communities. This study aims to calculate the prevalence of outpatient diseases through the characterization of outpatient prescriptions. The data used in this study is collected from 1412 prescriptions for various ty...
Parallel Spatial Pyramid Match Kernel Algorithm for Object Recognition using a Cluster of Computers
This paper parallelizes the spatial pyramid match kernel (SPK) implementation. SPK is one of the most usable kernel methods, along with support vector machine classifier, with high accuracy in object recognition. MATLAB parallel computing toolbox has been used to parallelize SPK. In this implementation, MATLAB Message Passing Interface (MPI) functions and features included in the toolbox help u...
GRID Oriented Implementation of Self-organizing Maps for Data Mining in Meteorology
We study the efficiency of different alternatives for a scalable parallel implementation of the self-organizing map (SOM) in a GRID environment with variable resources and communications. In particular, we consider an application of data mining in Meteorology, which involves databases of high-dimensional atmospheric patterns. In this work, we focus on network partitioning alternatives, analyzing...
Publication date: 2004